Psychological Bulletin 
1984, Vol. 95, No. 2, 179-188 


Copyright 1984 by the
American Psychological Association, Inc.


Effectiveness of Coaching for Aptitude Tests 


James A. Kulik, Robert L. Bangert-Drowns, and Chen-Lin C. Kulik 
University of Michigan 


Reviews written during the 1950s reported that coaching programs were usually 
effective in raising scores on aptitude and intelligence tests. However, more recent 
reviews, focusing on the Scholastic Aptitude Test (SAT), have reported only small 
and insignificant effects from coaching. The meta-analytic investigation described 
in this article confirmed that coaching programs have had different effects on the 
SAT and other aptitude tests. In 14 studies on the SAT, coaching raised scores by 
an average of 0.15 standard deviations; in 24 studies on other aptitude and intelligence 
tests, coaching raised scores by an average of 0.43 standard deviations. Studies that 
used pretests reported stronger coaching effects than did studies with posttest-only 
designs. Other study features were not significantly related to outcomes of the
coaching studies.


Aptitude testing is a multi-million-dollar 
industry that plays an important role in 
American education. Every year millions of 
students take such tests as the Scholastic Aptitude
Test (SAT), the Graduate Record Examination,
and the Law School Admissions
Test; their lives are greatly affected by the re- 
sults. In recent years a coaching industry has 
evolved, in the shadow of the testing industry, 
which offers coaching and “crash” courses to 
help students improve their chances of scoring 
high on admissions and aptitude tests. The
coaching industry is made up of at least 150 
independent firms and offers services for more 
than 50,000 students annually (“Coaching 
Daze,” 1979). Still other “cram” courses are
offered by conventional schools for their own 
students. One third of the public and private 
schools in the Northeast, for example, offer 
some sort of SAT preparation course. 

The material in this report is based on work supported
by the National Institute of Education under Grant G 81
0047. Any opinions, findings, and conclusions or recommendations
expressed in this report are those of the
authors and do not necessarily reflect the views of the
National Institute of Education.

The authors extend their special thanks to Michael
Masnari and Mary Spirito for their invaluable assistance
in locating studies and coding study features.

Requests for reprints should be sent to James A. Kulik,
Center for Research on Learning and Teaching, University
of Michigan, 109 E. Madison St., Ann Arbor, Michigan
48109.

The testing and coaching industries embody
different beliefs about aptitude testing. According
to the testers, aptitude tests measure
capacities developed gradually from in-school
and out-of-school experiences. The testing industry
has long held that short-term coaching
is at best likely to yield insignificant increases
on aptitude test scores (College Entrance Examination
Board, 1968, p. 8). The coaching
industry, on the other hand, maintains that
aptitude test scores can be raised by test familiarization,
drill and practice, instruction in
test-taking strategy, and highly focused content
teaching. According to those who offer coaching
services, aptitude tests, like achievement
tests, measure content learning, but the content
covered on aptitude tests is more esoteric
and less likely to be covered in conventional
school classes.

The controversy between those who coach
students for tests and those who develop the
tests has important implications for testing
policy. If coaching works, then the testers may 
have to revise claims about their products, give 
different advice to students who are about to 
take tests, and confront the important issue 
of equal access to coaching programs. If 
coaching does not work, then the coaching 
industry is misleading its clients, and conven- 
tional schools are corrupting their curricula 
with crash courses unworthy of the schools 
and wasteful of students’ and teachers’ time. 
No single study of coaching has resolved 
the controversy. Studies of coaching have been 
carried out in different settings, with different 
research designs, and with different results. 
Some studies have yielded positive results 
(Evans & Pike, 1973), and others have yielded
negative findings (Keefauver, 1977). In one
highly publicized study investigating the ef- 
fectiveness of SAT preparation at two schools, 
coaching was ineffective at one school, but 
contributed about 25 points to scores on both 
the verbal and mathematical SAT examina- 
tions at the other school (Federal Trade Com- 
mission, 1979). The type of coaching program 
seemed to be more significant than the simple 
fact of coaching. 

The issue of coaching effects is the cause of 
much controversy among reviewers. The first 
major reviews were reported in England. Gen- 
erally, they concluded that formal and informal 
coaching had a significant influence on test 
performance. Vernon (1954), summarizing 
empirical findings and opinions of a number 
of British experts at a symposium on coaching, 
revealed that the average effect of coaching 
and practice was to increase IQ scores by 8 
to 9 points, or by nearly 0.6 standard devia- 
tions. Vernon pointed out that such an effect 
could be achieved in a remarkably short time, 
usually between three and nine hours. According
to Vernon (1954), larger amounts of
coaching, at home or at school, were futile and 
undesirable. 

More recent reviews have focused on the 
widely used SAT. A few of these reviews have 
concluded that well-designed coaching pro- 
grams can lead to moderate changes in SAT 
scores (Pike, 1979; Slack & Porter, 1980). 
However, other reviews have emphasized the 
futility of coaching for this exam. The trustees 
of the College Board, for example, stated that 
the average increase expected from intensive 
coaching on the SAT is fewer than 10 points 
on the SAT score scale of 200 to 800 (College 
Entrance Examination Board, 1968, p. 4). In 
a recent review, two Educational Testing Ser- 
vice (ETS) researchers drew conclusions in a 
similar vein: “The time required to achieve average
score increases much greater than 20 to
30 points (on a 200- to 800-point scale) rapidly 
approaches that of full-time schooling” (Mes- 
sick & Jungeblut, 1981, p. 215). 

There are at least two reasons for the in- 
consistency in conclusions about the effects of 
coaching. First, reviewers have not examined 
the same studies. There is no overlap, for ex- 
ample, in studies cited by Vernon (1954) and 
Messick and Jungeblut (1981). Each review 
examined only a portion of the available studies.
Second, reviewers have not generally analyzed
the accumulated study results with
quantitative and statistical methods. The use 
of such objective methods forces reviewers to 
put their notions to quantitative test. Reviewers 
who use less formal methods often see what 
they expect to in a collection of results. 

This review is meant to complement earlier 
reviews of coaching effects. It covers a large 
group of coaching studies, and its conclusions 
are based on a quantitative analysis, or meta- 
analysis, of the accumulated research findings. 
The specific questions that the article addresses 
are these: How effective is the typical coaching 
program? Under which conditions are coach- 
ing programs most effective? Are there certain 
features of programs that increase the likeli- 
hood of their success? 


Method 


The meta-analytic approach used in this review is similar 
to that described by Glass, McGaw, and Smith (1981). 
Their approach requires a reviewer to locate studies of an 
issue through objective and repeatable searches, to code 
the studies for salient features, and to describe study out- 
comes on a common scale. 


Data Sources 


To find studies of coaching effectiveness, we carried out 
manual searches of four library data bases: Research in 
Education, the data base containing reports available from 
the Educational Resources Information Center (ERIC); 
Current Index to Journals in Education; Comprehensive 
Dissertation Abstracts; and Psychological Abstracts. The 
bibliographies and articles located through these data bases 
provided a second source of studies for the meta-analysis. 

To qualify for use in this meta-analysis, studies had to 
meet several explicit criteria. The most important was that 
the study involved a true test-coaching program, not a 
program of practice or tutoring. In a true coaching pro- 
gram, students are told how to answer specific types of 
test questions and are given suggestions on how to improve 
their test performance. In a program of simple practice, 
on the other hand, students learn from their own experience 
by taking an alternate form of a test under standard con- 
ditions; there is no real teaching. In a tutoring program, 
students receive direct instruction from peer- or cross-age 
tutors, but the focus is on improving academic skills, espe- 
cially in reading and mathematics, not on improving per- 
formance on a specific test. 

The other major criteria set up for studies covered their 
methodological adequacy. To be included in the analyses, 
studies had to report results in quantitative terms. Size of 
effects could not be calculated from anecdotal and im- 
pressionistic studies. In addition, results had to be available 
from an uncoached control group as well as from a coached 
group. In studies without control groups, apparent effects
of coaching may be due to growth, simple practice on a
pretest, or other factors.

A few further rules ensured that no individual paper
or study exerted a disproportionate influence on overall 
results. When a single paper reported findings from several 
different instruments or populations, we pooled the results 
from the instruments or populations to obtain a composite 
finding. When a paper examined transfer effects on a test 
or item-type not covered directly in a coaching program 
(e.g., Evans & Pike, 1973), we coded only results from the 
target test or item-type and ignored the transfer effects. 
When a single paper reported results from two different 
coaching programs, however, we treated the paper as a 
report on two studies. Investigators whose papers fell into 
this category were: Alderman and Powers (1980), who 
reported results from coaching programs of moderate 
length (5 to 11 hr) and from a long-term coaching program 
(45 hr); French (1955), who presented results from distinct 
programs of “SAT coaching” and “vocabulary training”; 
and the Federal Trade Commission's Bureau of Consumer 
Protection (1979), which presented results from different 
programs developed by the Stanley H. Kaplan Educational 
Center and by the Test Preparation Center. 


Characteristics of Studies 


Twenty variables were used to describe the main features
of the studies. These variables covered characteristics of
the coaching programs, the methodologies used in the
experiments, the subjects employed, and the publications
in which studies were reported.

Seven variables were used to describe the coaching programs.
The first of these variables classified the programs
into three levels of training intervention originally described
by Anastasi (1981): (a) short test-taking orientation and
practice sessions; (b) longer coaching programs that include
intensive, concentrated drill or “cramming” on sample
test questions; and (c) instruction in broad cognitive skills.
The second of the variables gave the coaching time in
hours for the programs, and the third specified whether
the coaching was provided by a commercial coaching firm
or through a school. The next four variables specified
whether the following components were present or absent
in the coaching programs: training in test-wiseness strategies;
anxiety-reduction exercises; actual practice on test
items; and direct content teaching.

Nine variables described methodological characteristics 
of the studies: random versus nonrandom assignment of 
subjects to experimental and control groups; pretest versus 
posttest-only design; artificial versus real testing situation; 
locally constructed versus standardized test; ETS-sponsored 
versus other research; new versus field-tested coaching 
program; coaching for SAT versus other aptitude test; 
coaching for group versus individual test; and coaching 
for a full test versus a subtest. 

Characteristics of subjects were described in two vari- 
ables: grade level and ability level. Finally, publication 
characteristics were described in two additional variables: 
source of the study (journal article, dissertation, or ERIC 
report) and the year of the report. 


Study Outcomes 


The metric that we used to describe coaching outcomes
was the effect size (ES), defined as the difference between
average test scores of the experimental and control groups,
divided by the standard deviation of the test. The method
used in calculating ES varied with the information given
in the study. For studies reporting means and standard
deviations for both experimental and control groups, we
calculated ESs from the measurements provided. When
both pretest and posttest standard deviations were given
in such studies, pretest standard deviations were used in
the calculation of ESs. Pretest standard deviations were
used to estimate the population standard deviation because
pretest variation was uninfluenced by the experimental
treatment or by practice; posttest variation might be influenced
by such factors. In well-reported studies that did
not use pretests, the standard deviation of the control group
was used in the calculation of ES because the control
group standard deviation was unaffected by the experimental
treatment. For studies that did not report means
and standard deviations, we calculated ES from statistics
such as t and F, using procedures described by Glass,
McGaw, and Smith (1981).
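
The calculation rules described above can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical. The conversion from t assumes the commonly used two-group relation ES = t * sqrt(1/n_coached + 1/n_control), which standardizes on a pooled standard deviation; the article cites Glass, McGaw, and Smith (1981) for its procedures without reproducing the formulas, so this is an illustration under that assumption rather than the authors' exact computation.

```python
import math


def effect_size_from_means(mean_coached, mean_control, sd):
    """ES = (coached mean - control mean) / standard deviation.

    `sd` is whichever estimate the study allows: the pretest SD when
    available, otherwise the control-group SD.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean_coached - mean_control) / sd


def effect_size_from_t(t, n_coached, n_control):
    """Assumed conversion for an independent-groups t statistic."""
    return t * math.sqrt(1.0 / n_coached + 1.0 / n_control)


def effect_size_from_f(f, n_coached, n_control, coached_higher=True):
    """For a two-group comparison F = t**2; the caller supplies the sign."""
    if f < 0:
        raise ValueError("F cannot be negative")
    t = math.sqrt(f)
    return effect_size_from_t(t if coached_higher else -t, n_coached, n_control)


if __name__ == "__main__":
    # Hypothetical study: coached mean 52, control mean 50, pretest SD 10.
    print(round(effect_size_from_means(52.0, 50.0, 10.0), 2))  # 0.2
    # Hypothetical study reporting only t = 2.0 with 50 students per group.
    print(round(effect_size_from_t(2.0, 50, 50), 2))  # 0.4
```

Because an ES recovered from t or F rests on a pooled standard deviation, it may differ slightly from an ES standardized on the pretest or control-group standard deviation.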


Results 


The search procedures yielded 35 separate 
reports that met the criteria established for use 
in this meta-analysis. These 35 reports con- 
tained results from 38 different studies (Ta- 
ble 1). 


Overall Effects 


In 35 of the 38 studies, the coaching pro- 
gram had a positive effect on test performance; 
in three studies, the effect of coaching was 
negative. Of the 35 studies with positive results, 
the effect of coaching was reported significant 
in 25; no study reported a significant negative 
effect. These box score results support the gen- 
eralization that coaching has a positive effect 
on aptitude test performance. 

How large is this effect? Thirty-one studies 
used a pretest—posttest design, and 20 of these 
reported average scores separately on pretest 
and posttest for experimental and control 
groups. Improvement from pretest to posttest 
in these 20 studies averaged 0.64 standard de- 
viations for the coached groups. This im- 
provement reflected the combined influence 
of both the coaching program and practice on 
a pretest. Improvement from pretest to posttest 
for the control groups in the studies averaged 
0.24 standard deviations and reflected the in- 
fluence of a single practice test taken without 
a coaching intervention. The effect of coaching 
alone, estimated from these 20 studies, would 
be 0.64 − 0.24, or 0.40 standard deviations.
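
The subtraction used here, and again in the separate analyses of SAT and other studies, can be written out as a short Python sketch. The function name and dictionary keys are hypothetical; the gains are the averages reported in this article. The sketch assumes, as the text does, that the control-group gain captures the practice effect and that practice and coaching effects add.

```python
def net_coaching_effect(coached_gain, control_gain):
    """Pretest-to-posttest gain of the coached group minus the gain of
    the control group, both in standard deviation units."""
    return coached_gain - control_gain


# (coached gain, control gain) as reported in the article
reported_gains = {
    "all studies reporting pretest and posttest means": (0.64, 0.24),
    "SAT studies": (0.36, 0.21),
    "studies of other tests": (0.76, 0.25),
}

if __name__ == "__main__":
    for label, (coached, control) in reported_gains.items():
        print(label, round(net_coaching_effect(coached, control), 2))
```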


The seven studies that used a posttest-only
design provided a somewhat smaller estimate
of the size of the coaching effect. Posttest scores
of experimental and control groups in these
studies differed on the average by 0.27 standard
deviations.





Table 1
Major Features of 39 Studies of Coaching for Aptitude Tests

                                                                         Grade  Level of  Contact  Effect
Study                                Test                                level  coaching  hours    size
Alderman & Powers (1980)
  Study 1                            SAT-V                               11     2         8.5       0.08
  Study 2                            SAT-V                               11     3         45.0      0.12
Bernal (1971)                        Teacher-modified                    8      1         0.7       0.40
Boger (1952)                         Otis Quick-Scoring & California     2      1         13.6      0.45
                                       Test of Mental Maturity
Casey, Davidson, & Horter (1928)     Teacher-made                        3      2         9.0       0.74
Dear (cited in French & Dear, 1959)  SAT-M                               12     2         12.0      0.26
Dyer (cited in French & Dear, 1959)  SAT-M                               12     2         15.2      0.10
Evans (1977)                         GRE-Q                               15     2         12.0      0.26
Evans & Pike (1973)                  SAT-M                               11     2         21.0      0.52
Federal Trade Commission (1979)
  Study 1                            SAT                                 11     2         40.0      0.31
  Study 2                            SAT                                 11     2         24.0      0.07
Flynn & Anderson (1977)              Thurstone Test of Mental Alertness  14     1         0.2       0.02
Frankel (1960)                       SAT                                 12     2         30.0      0.10
French (1955)
  Study 1                            SAT                                 12     2         17.0      0.17
  Study 2                            SAT-V                               12     1         4.5       0.05
Gilmore (1927)                       Otis                                14     2                   0.71
Goldsmith (1980)                     DAT-V                               10     1         2.5       0.66
Greene (1928)                        Stanford-Binet                      2      2         2.0       0.57
Holloway (1954)                      Primary Mental Abilities & WISC     K      2                   0.53
Jefferson (1975)                     Otis-Lennon                         17     1         3.0       0.70
Keefauver (1977)                     SAT                                 12     2         14.2      0.01
Keysor (1977)                        MCAT                                15     1         5.0       0.18
Kintisch (1979)                      SAT-V                               12     3         33.3      0.14
Klutch (1976)                        DAT                                 8      1         4.2       0.43
Lent & Russell (1978)                Teacher-made                        13     1                   0.44
Lewis & Kuske (1978)                 National Medical Board              18     3                  −0.06
Melametsa (1965)                     Teacher-made                        8      2         4.0       0.84
Merriman (1927)                      Thorndike College Entrance Exam     15     2         6.0       0.40
Moore (1971)                         Teacher-made                        17     1         0.5       0.78
Oakland (1972)                       Metropolitan Readiness Test         1      1         6.0       0.46
Petty & Harrell (1977)               Otis-Lennon                         6      1         1.0       0.23
Rayford (1973)                       Lorge-Thorndike                     8      1         0.5      −0.01
Roberts & Oppenheim (1966)           PSAT                                11     2         7.5       0.12
Rutan (1979)                         DAT                                 9      1         6.0       0.57
Trainor (1939)                       Detroit Intelligence Test           14     3                   0.45
Whitely & Dawis (1974)               Teacher-made                        10     2         1.0       0.43
Whitla (1962)                        SAT                                 12     2         10.0      0.03
Wiseman & Wrigley (1953)             Moray House                         5      2         6.0       0.13

Note. DAT = Differential Aptitude Test; GRE = Graduate Record Examination; MCAT = Medical College Admissions
Test; PSAT = Preliminary Scholastic Aptitude Test; SAT = Scholastic Aptitude Test; WISC = Wechsler Intelligence
Scale for Children. The modifiers M, Q, and V denote the mathematical, quantitative, and verbal parts, respectively,
of these instruments.


On the basis of all 38 studies, we estimated 
the average ES of coaching to be 0.33. It is 
statistically unlikely that an ES as large as 0.33
would be found if the literature showed no
overall effect of coaching, t(37) = 8.12, p <
.001. An ES of 0.33 implies that coaching will
raise a typical student’s score from the 50th 
to the 63rd percentile. On an aptitude test 
with a population mean of 100 and a standard 
deviation of 15, the effect of a coaching pro- 
gram alone will be to raise the typical score 
from 100 to 105. On an aptitude test with a 
population mean of 500 and a population 
standard deviation of 100, the effect will be 
to raise the typical score from 500 to 533. 
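
The percentile and score-scale translations in this paragraph can be reproduced with a brief Python sketch. It assumes that test scores are normally distributed, which the paragraph implies but does not state; the function names are hypothetical.

```python
from statistics import NormalDist


def percentile_after_coaching(es, start_percentile=50.0):
    """Percentile reached by a student who starts at `start_percentile`
    and gains `es` standard deviations (normal distribution assumed)."""
    unit_normal = NormalDist()
    z_start = unit_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * unit_normal.cdf(z_start + es)


def score_after_coaching(es, population_mean, population_sd):
    """Expected score of a typical (mean-scoring) student after coaching."""
    return population_mean + es * population_sd


if __name__ == "__main__":
    print(round(percentile_after_coaching(0.33)))       # 63
    print(round(score_after_coaching(0.33, 100, 15)))   # 105
    print(round(score_after_coaching(0.33, 500, 100)))  # 533
```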


Heterogeneity of Studies 


The distribution of ESs was multimodal in 
shape (Figure 1). One of the modes was at 0.1 
standard deviations; another was at 0.4 stan- 
dard deviations. Further examination of the 
data showed that studies of coaching for the 
SAT were clustered tightly at the smaller mode; 
other studies were spread somewhat more 
loosely around the larger mode (Figure 2). 
Coaching programs for the SAT thus seemed 
to have different effects from coaching pro- 
grams for other tests. 

Compared with other studies, SAT studies 
were significantly more likely to involve long- 
term coaching, field-tested coaching programs, 
coaching by a commercial school, testing for 
a real-life educational decision, higher grade 
levels, pre- and postresearch designs, and research
carried out by ETS. A factor analysis
of the study features, in fact, yielded a strong
first factor on which target test (SAT vs. other)
for the coaching program had the highest
loading. Other variables with high loadings on
this factor included type of coaching program
(short vs. long term); novelty of the coaching
program (new vs. field tested); location of the
program (commercial vs. school based); and
the realism of the test situation (laboratory
test vs. actual test).

Figure 1. Distribution of coaching effects for 39 studies
(frequency plotted against effect size).

Figure 2. Distribution of coaching effects for 15 Scholastic
Aptitude Test (SAT) studies and 24 studies of other aptitude
tests (frequency plotted against effect size, with separate
distributions for SAT studies and other studies).

Because the SAT studies were different from 
other studies both in their features and in their 
outcomes, analysis of data from the total group 
would most likely produce misleading results. 
In a pooled analysis, any study feature that was
highly characteristic of the SAT studies would 
appear to have an influence on size of effect, 
whether or not the feature was related to size 
of effect with target test held constant. To guard 
against misleading results, we carried out all 
further analyses separately on SAT studies and 
on studies of other aptitude tests. 
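
The separation between the two groups of studies can be checked against the effect sizes listed in Table 1. The Python sketch below transcribes those values as they are printed in this copy and computes group means; the variable names are hypothetical, and counting the PSAT study (Roberts & Oppenheim, 1966) among the SAT studies is an assumption made here because it yields the 14 SAT studies that the text reports.

```python
from statistics import mean

# Effect sizes from Table 1, in table order.
sat_es = [
    0.08, 0.12,  # Alderman & Powers (1980), Studies 1 and 2
    0.26, 0.10,  # Dear; Dyer (cited in French & Dear, 1959)
    0.52,        # Evans & Pike (1973)
    0.31, 0.07,  # Federal Trade Commission (1979), Studies 1 and 2
    0.10,        # Frankel (1960)
    0.17, 0.05,  # French (1955), Studies 1 and 2
    0.01,        # Keefauver (1977)
    0.14,        # Kintisch (1979)
    0.12,        # Roberts & Oppenheim (1966), PSAT (assumed grouping)
    0.03,        # Whitla (1962)
]
other_es = [
    0.40, 0.45, 0.74, 0.26, 0.02, 0.71, 0.66, 0.57,
    0.53, 0.70, 0.18, 0.43, 0.44, -0.06, 0.84, 0.40,
    0.78, 0.46, 0.23, -0.01, 0.57, 0.45, 0.43, 0.13,
]

if __name__ == "__main__":
    print(len(sat_es), round(mean(sat_es), 2))          # 14 0.15
    print(len(other_es), round(mean(other_es), 2))      # 24 0.43
    print(round(mean(sat_es + other_es), 2))            # 0.33
```

The pooled mean of 0.33 sits between the two group means, which is why a study feature concentrated in one group could appear related to effect size in a pooled analysis.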


Coaching for the SAT 


Each of the 14 studies of SAT coaching em- 
ployed both pretests and posttests. Improve- 
ment from pretest to posttest averaged 0.36 
standard deviations for the experimental 
groups and 0.21 standard deviations for the 
control groups. The effect of coaching alone, 
estimated from the 14 studies, was equal to 
0.36 − 0.21, or 0.15. It is statistically unlikely
that an ES as large as 0.15 would be found
if the literature showed no overall effect of
coaching, t(13) = 4.08, p < .01.

Table 2
Means and Standard Errors of Effect Sizes for Different Categories of Studies

                                SAT studies         Other studies
                                     Effect size         Effect size
Categories                      N      M   SE       N      M   SE
Level of intervention
  1. Short orientation          1     .05           13    .41  .07
  2. Drill on test              11    .16  .05      9     .51  .08
  3. Broad skill training       2     .13  .01      2     .19  .25
Duration
  Short (less than 3 hr)        0                   8     .38  .10
  Average (3 to 9 hr)           3     .08  .02      9     .49  .08
  Long (more than 9 hr)         11    .16  .05      2     .35  .09
Coacher
  Commercial                    4     .13  .06      1    −.06
  School & other                10    .16  .05      23    .45  .05
Test-wiseness component
  Yes                           5     .20  .10      12    .39  .08
  No                            9     .12  .02      12    .47  .07
Anxiety-reduction component
  Yes                           0                   3     .37  .07
  No                            14    .15  .04      21    .44  .06
Practice component
  Yes                           13    .16  .04      15    .46  .06
  No                            1     .03           9     .38  .09
Content component
  Yes                           8     .16  .06      8     .42  .09
  No                            6     .13  .04      16    .43  .07
ETS research
  Yes                           8     .18  .05      1     .26
  No                            6     .11  .05      23    .44  .05
Novelty
  Yes                           6     .16  .08      19    .46  .06
  No                            8     .14  .03      5     .31  .11
SAT
  Yes                           14    .15  .04      0
  No                            0                   24    .43  .05
Group test
  Individual                    0                   3     .50  .17
  Group                         14    .15  .04      21    .42  .06
Subtest
  General                       13    .12  .02      19    .39  .06
  Subtest                       1     .52           5     .58  .10
Group assignment
  Random                        4     .21  .10      10    .40  .09
  Nonrandom                     10    .12  .03      14    .45  .06
Use of pretest*
  Pretest-posttest design       14    .15  .04      17    .50  .05
  Posttest-only design          0                   7     .27  .12
Realism
  Actual test situation         10    .12  .03      5     .32  .15
  Laboratory study              4     .21  .10      19    .46  .05
Test construction
  Teacher-made or modified      0                   10    .47  .08
  Standardized                  14    .15  .04      14    .40  .07
Grade level of subjects
  K-6                           0                   7     .44  .08
  7-12                          14    .15  .04      7     .47  .10
  13-18                         0                   10    .39  .09
Ability level of subjects
  Low                           4     .20           4     .39
  Middle                        7     .15  .04      11    .48  .06
  High                          3     .06  .04      9     .38  .10
Nature of publication
  Published                     10    .16  .05      15    .44  .07
  Unpublished                   4     .12  .07      9     .40  .08
Year of publication
  Before 1960                   5     .14  .04      8     .50  .07
  1960-1975                     4     .20  .11      8     .50  .10
  After 1975                    5     .11  .05      8     .29  .09

Note. ETS = Educational Testing Service; SAT = Scholastic Aptitude Test.
* Difference in ESs for categories of this variable significantly greater than zero (p < .05) for studies of tests other
than the SAT.
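
The significance tests reported for the mean effect sizes, such as t(13) = 4.08 for the 14 SAT studies, have n − 1 degrees of freedom, which is consistent with a one-sample t test that treats each study's ES as one observation and tests the mean against zero. The article does not spell out the procedure, so the Python sketch below is an illustration under that assumption; the function name and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(effect_sizes, null_mean=0.0):
    """Return (t, degrees of freedom) for a test of the mean ES
    against `null_mean`, with studies as the unit of analysis."""
    n = len(effect_sizes)
    if n < 2:
        raise ValueError("at least two studies are required")
    standard_error = stdev(effect_sizes) / math.sqrt(n)
    return (mean(effect_sizes) - null_mean) / standard_error, n - 1


if __name__ == "__main__":
    # Hypothetical effect sizes from five studies.
    t_value, df = one_sample_t([0.10, 0.25, 0.05, 0.30, 0.15])
    print(f"t({df}) = {t_value:.2f}")
```

Values recomputed from rounded effect sizes can be expected to differ somewhat from the published statistics.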

None of the study features was significantly 
related to ES in the SAT studies (Table 2). 
Effects were similar for SAT coaching pro- 
grams of different durations and with different 
characteristics. Findings were also similar in 
groups of studies that used different meth- 
odologies or employed distinctly different sub- 
ject groups. Finally, findings were much the 
same for studies published in different ways 
and at different times. 


Coaching for Other Tests 


Seventeen of the 24 studies of coaching for 
aptitude tests other than the SAT used a pre- 
test-posttest design, and 14 of these studies 
reported pretest and posttest averages sepa- 
rately for the experimental and control groups. 
Improvement from pretest to posttest averaged 
0.76 standard deviations for the experimental 
groups and 0.25 standard deviations for the
control groups. The effect of coaching alone,
estimated from these studies, was therefore
equal to 0.76 − 0.25, or 0.51 standard deviations.
The seven studies that did not use pretests
yielded a significantly lower estimate of
the size of coaching effects; scores of coached
and uncoached groups in these studies differed
on the average by 0.27 standard deviations.
On the basis of all 24 studies, we estimated
the average ES of coaching to be 0.43. It is
statistically unlikely that an effect as large as
0.43 would be found if the literature reported
no overall effect of coaching on these tests,
t(23) = 8.24, p < .001.

The use of a pretest in the experimental 
design turned out to be the only study feature 
significantly related to size of effect (Table 2). 
As in the studies of coaching for the SAT, ESs 
were similar for coaching programs of different 
types: long and short programs, programs with 
different characteristics, and programs developed
under different auspices. Findings were
also similar for groups of studies that used 
different methodologies or that employed dif- 
ferent subject groups. Finally, findings were 
much the same for studies published in dif- 
ferent ways and at different times. 


Discussion 


This meta-analysis showed that there are 
two distinct literatures on the effectiveness of 
coaching programs. One of these examines 
coaching effects on the SAT. Its studies tend
to involve long-term coaching by commercial 
schools, field-tested coaching programs, testing 
for real-life education decisions, and students 
at higher grade levels. The other literature ex- 
amines coaching effects on other tests. Its 
studies tend to focus on short-term programs 
offered in school settings, newly developed 
programs, and posttest-only research designs. 
The literature on the SAT generally reports 
small effects from coaching; the literature on
other tests shows that coaching programs can
have substantial effects. 

The small SAT effects should come as no 
surprise to anyone familiar with recent reviews 
of findings on coaching. In 1968, the College 
Entrance Examination Board reported that the 
net result across all studies of special prepa- 
ration for the SAT was a score gain of fewer 
than 10 points (or about 0.1 standard devia- 
tions). In their comprehensive review of find- 
ings on SAT coaching, Messick and Jungeblut 
(1981) also reported small overall effects of 
coaching on SAT scores. In the typical con- 
trolled study, the average gain attributable to 
coaching was 14.3 points for the verbal section 
and 15.1 points for the mathematics section. 
Even Slack and Porter’s (1980) review, which 
stressed the effectiveness of coaching programs 
for the SAT, reported median gains from 
coaching of only 16 points on the verbal section 
of the SAT and 15 points on the mathematics 
section. 

Although most SAT studies showed almost 
no effect from coaching, a well-designed study 
by Evans and Pike (1973) reported sizable ef- 
fects. Although this study did not differ from 
other studies in features that we examined in 
our formal analysis, further examination 
showed that the study did have a unique char- 
acteristic. It was carried out by ETS researchers 
who were thoroughly familiar with the SAT 
item pool and who developed special coaching 
materials for specific item types included in 
this item pool. Other coachers have not been 
as familiar with SAT items because ETS se- 
curity policies have until recently put SAT test 
forms out of their reach. Recent changes in 
ETS policies give the public much more access 
to SAT items and test forms, and it seems 
possible, therefore, that future SAT coaching 
programs will prove to be more successful than 
past programs. 

Reviewers have speculated that program 
duration can explain some of the variation in 
outcome of studies on SAT effectiveness (Mes- 
sick & Jungeblut, 1981; Slack & Porter, 1980). 
They have reported that effective coaching 
programs are long in duration, whereas inef- 
fective programs are short. Messick and Jun- 
geblut (1981) developed regression equations 
relating program length to gain attributable 
to coaching. Their regression lines fit their data 
closely. For the verbal part of the SAT, the
correlation between predicted and actual study
effects was .70 (based on 17 studies); for the 
mathematics part, the correlation was .59 
(based on 22 studies). 

Our meta-analysis did not disclose any sub- 
stantial correlation between program duration 
and SAT effects. In fact, we found that effects 
were nearly the same for long and short coach- 
ing programs. We suspect that the type of study 
included in the analyses can explain the dif- 
ference between Messick and Jungeblut’s 
(1981) findings and our own. We included in 
our sample only controlled studies of coaching 
programs; Messick and Jungeblut included in 
their sample two uncontrolled studies of what 
is ordinarily considered regular school in- 
struction (Pallone, 1961; Marron, 1965). These 
two studies of school instruction dramatically 
affected the placement of regression lines by 
Messick and Jungeblut. Other factors that may 
explain Messick and Jungeblut’s findings are 
their fitting of regression lines to twice as many 
data points as they had independent studies 
and their eliminating “questionable” data 
points from their analyses. 

Findings from coaching programs for other 
aptitude tests were similar to findings presented 
by Vernon (1954). According to Vernon, prac- 
tice and coaching raised aptitude scores by 
about 0.6 standard deviations (or 8 to 9 points 
on an IQ scale). Our finding for aptitude tests 
other than the SAT indicated an average gain 
from practice and coaching of 0.76 standard 
deviations. It is important to note, however, 
that this was an estimate of the combined effect 
of coaching plus practice on a pretest. The 
effect attributable to coaching alone was 0.4 
standard deviations. 

Studies that used a pretest yielded larger 
estimates of pure coaching effects than did 
other studies. In studies with a pretest, effects 
attributable to coaching averaged 0.51 stan- 
dard deviations. In studies without pretests, 
effects of coaching averaged 0.27 standard de- 
viations. It seems possible that the pretest acted 
to sensitize the students to the information 
presented in the coaching program. If so, a 
pretest may be an important component in 
any program designed to prepare students for 
aptitude tests. 

We were not able to find other factors that 
influenced study results. Although this failure 
was disappointing, it was not unexpected. After
examining results from numerous meta-anal- 
yses, Glass, McGaw, and Smith (1981) con- 
cluded reluctantly that the findings of contem- 
porary research in the social sciences often fit 
together poorly and that variation in study 
findings is only modestly predictable from 
study characteristics. The results of our meta-analysis
support this conclusion. Even with
the use of objective tools for synthesis of find- 
ings, it was impossible to explain fully why 
coaching results differ from study to study as 
extensively as they do. 


References 


Alderman, D. L., & Powers, D. E. (1980). The effects of 
special preparation on SAT-verbal scores. American Ed- 
ucational Research Journal, 17, 239-253. 

Anastasi, A. (1981). Coaching, test sophistication, and de-
veloped abilities. American Psychologist, 36, 1086-1093.

Bernal, E. M., Jr. (1971). Concept learning among Anglo,
Black, and Mexican American children using facilitation
strategies and bilingual techniques. Dissertation Ab-
stracts International, 32, 6180A. (University Microfilms
No. 72-15,707)

Boger, J. H. (1952). An experimental study of perceptual
training on group IQ test scores of elementary pupils
in rural ungraded schools. Journal of Educational Re-
search, 46, 43-52.

Casey, M. L., Davidson, H. P., & Horter, D. I. (1928). 
Three studies on the effect of training in similar and 
identical material upon Stanford-Binet test scores. 
Twenty-seventh Yearbook of the National Society for the 
Study of Education, 1, 431-439.

Coaching daze (FTC vs. Kaplan). (1979, June). Time, p. 
57. 

College Entrance Examination Board. (1968). Effects of 
coaching on Scholastic Aptitude Test scores. New York: 
College Entrance Examination Board. (ERIC Document 
Reproduction Service No. ED 169 130) 

Evans, F. R. (1977). The GRE-Q Coaching/Instruction
Study. Princeton, NJ: Graduate Record Examinations, 
Educational Testing Service. (ERIC Document Repro- 
duction Service No. ED 163 088) 

Evans, F. R., & Pike, L. W. (1973). The effects of instruction 
for three mathematics item formats. Journal of Edu- 
cational Measurement, 10, 257-272. 

Federal Trade Commission, Bureau of Consumer Protec-
tion. (1979). Effects of coaching on standardized ad-
mission examinations: Revised statistical analyses of
data gathered by Boston Regional Office of the Federal
Trade Commission. Washington, DC: Federal Trade
Commission, Bureau of Consumer Protection. (NTIS
No. PB-296 196)

Flynn, J. T., & Anderson, B. E. (1977). The effects of test 
item cue sensitivity on IQ and achievement test per- 
formance. Educational Research Quarterly, 2(2), 32- 
39. 

Frankel, E. (1960). Effects of growth, practice, and coaching
on Scholastic Aptitude Test scores. Personnel and Guid-
ance Journal, 38, 713-719.

French, J. W. (1955). The coachability of the SAT in public
schools (RB 55-26). Princeton, NJ: Educational Testing
Service.

French, J. W., & Dear, R. E. (1959). Effects of coaching 
on an aptitude test. Educational and Psychological 
Measurement, 19, 319-330. 

Gilmore, M. F. (1927). Coaching for intelligence tests.
Journal of Educational Psychology, 18, 119-121.

Glass, G. V. (1976). Primary, secondary, and meta-analysis 
of research. Educational Researcher, 5, 3-8.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta- 
analysis in social research. Beverly Hills, CA: Sage Pub- 
lications. 

Goldsmith, R. P. (1980). The effects of training in test
taking skills and test anxiety management on Mexican
American students’ aptitude test performance. Disser-
tation Abstracts International, 40, 5790A. (University
Microfilms No. 80-09863)

Greene, K. B. (1928). The influence of specialized training
on tests of general intelligence. Twenty-seventh Yearbook
of the National Society for the Study of Education, 1,
421-428.

Holloway, H. D. (1954). Effects of training on the SRA
Primary Mental Abilities (Primary) and the WISC. Child
Development, 25, 253-263.

Jefferson, J. L. (1975). The effects of anxiety on the
achievement of black graduate students taking stan-
dardized achievement tests. Dissertation Abstracts In-
ternational, 35, 5121A. (University Microfilms No. 75-
3105)

Keefauver, L. W. (1977). The effects of a program of
coaching on Scholastic Aptitude Test scores of high
school seniors pretested as juniors. Dissertation Abstracts
International, 37, 5063A. (University Microfilms No.
77-3651)

Keysor, R. E. (1977). The effect of test wiseness on profes- 
sional school screening test scores. Dissertation Abstracts 
International, 37(9-B), 4652. (University Microfilms No.
77-4834) 

Kintisch, L. S. (1979). Classroom techniques for improving 
Scholastic Aptitude Test scores. Journal of Reading, 22,
416-419.

Klutch, M. I. (1976). The influence of test sophistication 
on standardized test scores. Dissertation Abstracts In- 
ternational, 37, 809A. (University Microfilms No. 76- 
19,058) 

Lent, R. W., & Russell, R. K. (1978). Treatment of test 
anxiety by cue-controlled desensitization and study-skills 
training. Journal of Counseling Psychology, 25, 217-
224.

Lewis, L. A., & Kuske, T. T. (1978). Commercial national 
board review programs: A case study at the Medical 
College of Georgia. Journal of the American Medical 
Association, 240, 754-755. 

Marron, J. E. (1965). Preparatory school test preparation: 
Special test preparation, its effect on College Board scores 
and the relationship of affected scores to subsequent 
college performance. West Point, NY: Research Division, 
Office of the Director of Admissions and Registrar, 
United States Military Academy. (ERIC Document Re- 
production Service No. ED 187 764) 

Melametsa, L. (1965). The influence of training on the 
level of test performance and the factor structure of 
intelligence tests. Scandinavian Journal of Psychology, 
6, 19-25.


Merriman, C. (1927). Coaching for mental tests. Educa- 
tional Administration and Supervision, 13, 59-64. 

Messick, S., & Jungeblut, A. (1981). Time and method in 
coaching for the SAT. Psychological Bulletin, 89, 191- 
216. 

Moore, J. C. (1971). Testwiseness and analogy test per-
formance. Measurement and Evaluation in Guidance,
3, 198-202. 

Oakland, T. (1972). The effects of test-wiseness materials 
on standardized test performance of preschool disad- 
vantaged children. Journal of School Psychology, 10, 
355-360. 

Pallone, N. J. (1961). Effects of short-term and long-term 
developmental reading courses upon S.A.T. verbal scores. 
Personnel and Guidance Journal, 39, 654-657. 

Petty, N. E., & Harrell, E. H. (1977). Effect of programmed 
instruction related to motivation, anxiety, and test wise- 
ness on Group IQ test performance. Journal of Edu- 
cational Psychology, 69, 630-635. 

Pike, L. W. (1979). Short-term instruction, testwiseness, 
and the Scholastic Aptitude Test: A literature review with 
recommendations. New York, NY: College Entrance Ex-
amination Board.

Rayford, O. L. (1973). An experimental study of the effects
of three modes of test orientation on scholastic aptitude 
and achievement scores. Dissertation Abstracts Inter- 
national, 33, 6099A. (University Microfilms No. 73- 
12,646) 

Roberts, S. O., & Oppenheim, D. B. (1966). The effect of
special instruction upon test performance of high school
students. Princeton, NJ: Educational Testing Service.
(ERIC Document Reproduction Service No. ED 053
158)

Rutan, P. C. (1979). Test sophistication training: A program 
level intervention for the school psychologist. Disser- 
tation Abstracts International, 40, 171A. (University 
Microfilms No. 79-14135) 

Slack, W. V., & Porter, D. (1980). The Scholastic Aptitude 
Test: A critical appraisal. Harvard Educational Review, 
50, 154-175. 

Trainor, J. C. (1939). Experimental results of training in 
general semantics upon intelligence test scores. Papers
from the First American Congress on General Seman- 
tics—Ellensberg, Washington. New York, NY: Arrow 
Editions. 

Vernon, P. E. (1954). Practice and coaching effects in in- 
telligence tests. Educational Forum, 18, 269-280. 

Whitely, S. E., & Dawis, R. V. (1974). Effects of cognitive 
intervention on latent ability measured from analogy 
items. Journal of Educational Psychology, 66, 710-717. 

Whitla, D. K. (1962). Effect of tutoring on Scholastic Ap- 
titude Test scores. Personnel and Guidance Journal, 41, 
32-37. 

Wiseman, S., & Wrigley, J. (1953). The comparative effects 
of coaching and practice on the results of verbal intel- 
ligence tests. British Journal of Psychology, 44, 83-94. 


Received December 13, 1982 
Revision received August 11, 1983


